Robustness of Optimality of Exploration Ratio against Agent Population in Multiagent Learning for Nonstationary Environments

Abstract

In this article, I show the robustness of the optimal exploration ratio against the number of agents (the agent population) in multiagent learning (MAL) under nonstationary environments. The agent population affects the efficiency of the agents' learning, because each agent's learning acts as a source of noise for the others. From this point of view, the exploration ratio should be small to make MAL effective. In nonstationary environments, on the other hand, each agent needs to explore with sufficient probability to catch up with changes in the environment, which means the exploration ratio must be reasonably large. I investigate the relation between the population and the efficiency of exploration based on a theorem that relates the exploration ratio to a lower bound of the learning error. Finally, it is shown that, under a certain condition, the agent population does not affect the optimal exploration ratio. This consequence is confirmed by several experiments using population games with various reward functions.

Introduction

Exploration is an indispensable behavior for learning agents, especially under nonstationary environments. An agent needs to explore permanently at a certain rate (the exploration ratio) to catch up with changes in the environment. On the other hand, in a multiagent learning (MAL) situation, one agent's exploration causes noise for the other agents, so each agent needs to keep its exploration ratio as small as possible to help the others learn. There is therefore a trade-off in choosing a "large or small" exploration ratio in MAL under nonstationary environments. Focusing on real-world problems, we can find several applications of MAL under nonstationary environments. Resource-allocation tasks such as traffic management and smart-grid control are typical examples. One of the difficulties in such applications is openness: the environment may increase or decrease the available resources continuously.
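The trade-off described above can be made concrete with a standard ε-greedy learner. The following is an illustrative sketch, not the formalization used in this paper: a fixed exploration ratio ε keeps the agent sampling all actions permanently, and a constant step size (rather than sample averaging) lets its value estimates track a drifting environment.

```python
import random

class EpsilonGreedyAgent:
    """Epsilon-greedy learner with a fixed, permanent exploration ratio.

    A constant step size gives an exponential recency-weighted average,
    so old information decays and the estimates can follow a
    nonstationary environment.
    """

    def __init__(self, n_actions, epsilon=0.1, step_size=0.1):
        self.epsilon = epsilon          # fixed exploration ratio
        self.step_size = step_size      # constant learning rate
        self.values = [0.0] * n_actions

    def act(self):
        if random.random() < self.epsilon:               # explore
            return random.randrange(len(self.values))
        # exploit: pick the action with the highest value estimate
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action, reward):
        # Move the estimate a fixed fraction toward the observed reward.
        self.values[action] += self.step_size * (reward - self.values[action])
```

Raising ε makes the agent recover faster when the environment changes, but in a MAL setting every extra exploration step is noise injected into the other agents' reward signals, which is exactly the tension the article analyzes.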
Also, the population of agents may change over time. In order to handle such openness, we need a method for choosing suitable behavior parameters of the agents, such as the exploration ratio. As a first step toward establishing such a method, we need to know the relations among these parameters (especially the exploration ratio), the properties of the environment, and the learning performance of the agents.

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Related Works

Choosing and controlling the exploration ratio has been studied mainly for stationary but noisy environments, or for single learning agents (Zhang and Pan 2006; Martinez-Cantin et al. 2009; Rejeb, Guessoum, and M'Hallah 2005; Tokic 2010; Reddy and Veloso 2011). Most of these works focus on the relation between the total performance of the agents and the learning speed when balancing exploration and exploitation. No-regret learning also provides a means for agents to learn and reach an equilibrium of action selection in multiagent and probabilistic environments (Gordon, Greenwald, and Marks 2008; Hu and Wellman 1998; Jafari et al. 2001; Greenwald and Jafari 2003). However, most of these studies assume that the environment is stationary, so that learning ends once the agents reach the equilibrium. Minority games and their dynamical variations have been studied by (Challet and Zhang 1998; Catteeuw and Manderick 2011); these works address the case of stationary environments and try to find relations among parameters and agent performance. For nonstationary settings, (Galstyan and Lerman 2002; Galstyan and Lerman 2003) provide numerical analyses of agents' behaviors under changing resource capacities. For MAL in nonstationary environments, (Noda 2013) proposed a formalization based on the concept of advantageous probabilities and derived a theorem about the lower bound of the learning error for a given exploration ratio.
In this article, I follow the result of that work and investigate which factors in MAL affect the optimal value of the exploration ratio in a kind of resource-sharing problem called population games.

Formalization and Theorems

This section provides a formalization of MAL in nonstationary environments.

Population Game

In this article, we focus on a set of simplified games called population games (PGs), in which multiple agents play and learn to make decisions.

Multiagent Interaction without Prior Coordination: Papers from the AAAI-14 Workshop
Itsuki Noda
National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono, Tsukuba, Ibaraki, JAPAN
JST CREST, Tokyo Institute of Technology
[email protected]
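To give an intuition for this class of games, the following is a minimal sketch of a congestion-style population game (the function and parameter names are my own, not the paper's): each of N agents repeatedly picks one resource, the reward of a resource is its capacity shared among all agents that chose it in that round, and each agent learns ε-greedily from its own rewards.

```python
import random

def play_population_game(n_agents, capacities, epsilon, rounds,
                         step_size=0.1, seed=0):
    """Simulate one run of an illustrative congestion-style population game.

    Each round, every agent picks a resource epsilon-greedily from its
    own value estimates; a resource's capacity is split evenly among the
    agents that chose it, so crowded resources pay less per agent.
    Returns the cumulative choice counts per resource over all rounds.
    """
    rng = random.Random(seed)
    n_res = len(capacities)
    values = [[0.0] * n_res for _ in range(n_agents)]  # one estimate per agent
    totals = [0] * n_res

    def choose(v):
        if rng.random() < epsilon:                     # explore
            return rng.randrange(n_res)
        return max(range(n_res), key=v.__getitem__)    # exploit

    for _ in range(rounds):
        choices = [choose(v) for v in values]
        counts = [choices.count(r) for r in range(n_res)]
        for i, c in enumerate(choices):
            reward = capacities[c] / counts[c]         # shared capacity
            values[i][c] += step_size * (reward - values[i][c])
            totals[c] += 1
    return totals
```

Each agent's payoff depends on every other agent's simultaneous choice, which is why one agent's exploration directly perturbs the reward signal that all the others learn from.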




Publication date: 2014